Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 50000 |
| Missing cells | 45647 |
| Missing cells (%) | 6.5% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 3.7 MiB |
| Average record size in memory | 78.0 B |
Variable types
| NUM | 12 |
|---|---|
| BOOL | 1 |
| CAT | 1 |
Warnings
| Warning | Type |
|---|---|
| engine_capacity has 30050 (60.1%) missing values | Missing |
| damage has 8266 (16.5%) missing values | Missing |
| insurance_price has 7331 (14.7%) missing values | Missing |
| power is highly skewed (γ1 = 52.35179232) | Skewed |
| Ind has unique values | Unique |
| type has 4329 (8.7%) zeros | Zeros |
| power has 4294 (8.6%) zeros | Zeros |
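The missing-value and zero counts in the warnings can be cross-checked against the dataset statistics. The sketch below only redoes the arithmetic on numbers copied from the report. The variable names and the `share` helper are illustrative and not part of pandas-profiling.

```python
# Counts copied from the report; nothing here is recomputed from raw data.
n_rows, n_cols = 50_000, 14
missing = {"engine_capacity": 30_050, "damage": 8_266, "insurance_price": 7_331}
zeros = {"type": 4_329, "power": 4_294}


def share(count, total):
    """Format count/total as a percentage with one decimal place."""
    return f"{count / total:.1%}"


missing_pct = {name: share(c, n_rows) for name, c in missing.items()}
zeros_pct = {name: share(c, n_rows) for name, c in zeros.items()}

# Missing cells are counted over the whole table (rows x columns).
total_missing = sum(missing.values())
missing_cells_pct = share(total_missing, n_rows * n_cols)

print(missing_pct)
print(zeros_pct)
print(total_missing, missing_cells_pct)
```

The three per-column missing counts add up to the reported number of missing cells, so no other column contributes missing values.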
Reproduction
| Analysis started | 2020-11-03 12:47:49.335185 |
|---|---|
| Analysis finished | 2020-11-03 12:48:42.067431 |
| Duration | 52.73 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
Ind
Real number (ℝ≥0)
| Distinct | 50000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50259.7297 |
|---|---|
| Minimum | 0 |
| Maximum | 99999 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5121.85 |
| Q1 | 25387.75 |
| median | 50349.5 |
| Q3 | 75231.75 |
| 95-th percentile | 95043.05 |
| Maximum | 99999 |
| Range | 99999 |
| Interquartile range (IQR) | 49844 |
Descriptive statistics
| Standard deviation | 28822.41975 |
|---|---|
| Coefficient of variation (CV) | 0.573469454 |
| Kurtosis | -1.196173129 |
| Mean | 50259.7297 |
| Median Absolute Deviation (MAD) | 24922 |
| Skewness | -0.01114460025 |
| Sum | 2512986485 |
| Variance | 830731880.1 |
| Monotonicity | Not monotonic |
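Several of the statistics above are simple functions of one another. As a rough consistency check, the sketch below recomputes the derived values for this variable from the reported mean, standard deviation, quartiles and extremes. All inputs are copied from the tables and the variable names are illustrative.

```python
# Values copied from the report tables for this variable.
n = 50_000
mean, std = 50259.7297, 28822.41975
q1, q3 = 25387.75, 75231.75
minimum, maximum = 0, 99999

iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
total = mean * n                  # sum of all observations

print(iqr, value_range, round(cv, 6), round(variance, 1), round(total))
```

Up to rounding, the results agree with the reported IQR, range, CV, variance and sum.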
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 7401 | 1 | < 0.1% | |
| 7433 | 1 | < 0.1% | |
| 5384 | 1 | < 0.1% | |
| 27911 | 1 | < 0.1% | |
| 95492 | 1 | < 0.1% | |
| 85251 | 1 | < 0.1% | |
| 17666 | 1 | < 0.1% | |
| 5922 | 1 | < 0.1% | |
| 87296 | 1 | < 0.1% | |
| 27479 | 1 | < 0.1% | |
| 46332 | 1 | < 0.1% | |
| 34042 | 1 | < 0.1% | |
| 40185 | 1 | < 0.1% | |
| 22342 | 1 | < 0.1% | |
| 14533 | 1 | < 0.1% | |
| 87344 | 1 | < 0.1% | |
| 50418 | 1 | < 0.1% | |
| 8390 | 1 | < 0.1% | |
| 9454 | 1 | < 0.1% | |
| 81133 | 1 | < 0.1% | |
| 79084 | 1 | < 0.1% | |
| 3307 | 1 | < 0.1% | |
| 1290 | 1 | < 0.1% | |
| 68875 | 1 | < 0.1% | |
| Other values (49975) | 49975 | > 99.9% |
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% | |
| 7 | 1 | < 0.1% | |
| 8 | 1 | < 0.1% | |
| 16 | 1 | < 0.1% | |
| 17 | 1 | < 0.1% | |
| 18 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 99999 | 1 | < 0.1% | |
| 99995 | 1 | < 0.1% | |
| 99990 | 1 | < 0.1% | |
| 99987 | 1 | < 0.1% | |
| 99985 | 1 | < 0.1% | |
| 99983 | 1 | < 0.1% | |
| 99981 | 1 | < 0.1% | |
| 99980 | 1 | < 0.1% | |
| 99979 | 1 | < 0.1% | |
| 99978 | 1 | < 0.1% |
engine_capacity
Real number (ℝ≥0)
| Distinct | 80 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 30050 |
| Missing (%) | 60.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.867213033 |
|---|---|
| Minimum | 0 |
| Maximum | 9.5 |
| Zeros | 34 |
| Zeros (%) | 0.1% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1.1 |
| Q1 | 1.4 |
| median | 1.8 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 9.5 |
| Range | 9.5 |
| Interquartile range (IQR) | 0.6 |
Descriptive statistics
| Standard deviation | 0.808439743 |
|---|---|
| Coefficient of variation (CV) | 0.432965992 |
| Kurtosis | 29.4342974 |
| Mean | 1.867213033 |
| Median Absolute Deviation (MAD) | 0.2 |
| Skewness | 4.305145264 |
| Sum | 37250.9 |
| Variance | 0.653574818 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 2 | 4113 | 8.2% | |
| 1.6 | 2997 | 6.0% | |
| 1.4 | 2485 | 5.0% | |
| 1.2 | 1891 | 3.8% | |
| 1.9 | 1822 | 3.6% | |
| 1.8 | 1499 | 3.0% | |
| 3 | 783 | 1.6% | |
| 1 | 708 | 1.4% | |
| 2.2 | 579 | 1.2% | |
| 2.5 | 524 | 1.0% | |
| 1.3 | 405 | 0.8% | |
| 2.4 | 225 | 0.4% | |
| 1.1 | 218 | 0.4% | |
| 1.5 | 203 | 0.4% | |
| 1.7 | 200 | 0.4% | |
| 2.7 | 180 | 0.4% | |
| 2.8 | 164 | 0.3% | |
| 3.2 | 145 | 0.3% | |
| 2.3 | 111 | 0.2% | |
| 4.2 | 86 | 0.2% | |
| 2.6 | 56 | 0.1% | |
| 4 | 44 | 0.1% | |
| 0 | 34 | 0.1% | |
| 5 | 32 | 0.1% | |
| 5.2 | 26 | 0.1% | |
| Other values (55) | 420 | 0.8% | |
| (Missing) | 30050 | 60.1% |
| Value | Count | Frequency (%) | |
| 0 | 34 | 0.1% | |
| 0.1 | 8 | < 0.1% | |
| 0.2 | 16 | < 0.1% | |
| 0.3 | 1 | < 0.1% | |
| 0.4 | 2 | < 0.1% | |
| 0.7 | 1 | < 0.1% | |
| 0.8 | 14 | < 0.1% | |
| 0.9 | 15 | < 0.1% | |
| 1 | 708 | 1.4% | |
| 1.1 | 218 | 0.4% |
| Value | Count | Frequency (%) | |
| 9.5 | 1 | < 0.1% | |
| 9.4 | 1 | < 0.1% | |
| 9.3 | 4 | < 0.1% | |
| 9.2 | 25 | 0.1% | |
| 9.1 | 7 | < 0.1% | |
| 9 | 12 | < 0.1% | |
| 8.9 | 1 | < 0.1% | |
| 8.6 | 1 | < 0.1% | |
| 8.5 | 4 | < 0.1% | |
| 8.3 | 1 | < 0.1% |
type
Real number (ℝ)
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.04 |
|---|---|
| Minimum | -1 |
| Maximum | 6 |
| Zeros | 4329 |
| Zeros (%) | 8.7% |
| Memory size | 49.0 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -1 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 7 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.370745117 |
|---|---|
| Coefficient of variation (CV) | 0.7798503674 |
| Kurtosis | -1.118899507 |
| Mean | 3.04 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.3644499003 |
| Sum | 152000 |
| Variance | 5.620432409 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 3 | 13392 | 26.8% | |
| 5 | 10270 | 20.5% | |
| 6 | 9423 | 18.8% | |
| -1 | 6254 | 12.5% | |
| 0 | 4329 | 8.7% | |
| 1 | 3284 | 6.6% | |
| 2 | 2643 | 5.3% | |
| 4 | 405 | 0.8% |
| Value | Count | Frequency (%) | |
| -1 | 6254 | 12.5% | |
| 0 | 4329 | 8.7% | |
| 1 | 3284 | 6.6% | |
| 2 | 2643 | 5.3% | |
| 3 | 13392 | 26.8% | |
| 4 | 405 | 0.8% | |
| 5 | 10270 | 20.5% | |
| 6 | 9423 | 18.8% |
| Value | Count | Frequency (%) | |
| 6 | 9423 | 18.8% | |
| 5 | 10270 | 20.5% | |
| 4 | 405 | 0.8% | |
| 3 | 13392 | 26.8% | |
| 2 | 2643 | 5.3% | |
| 1 | 3284 | 6.6% | |
| 0 | 4329 | 8.7% | |
| -1 | 6254 | 12.5% |
registration_year
Real number (ℝ≥0)
| Distinct | 120 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1839.1952 |
|---|---|
| Minimum | 0 |
| Maximum | 2016 |
| Zeros | 229 |
| Zeros (%) | 0.5% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 1998 |
| median | 2003 |
| Q3 | 2008 |
| 95-th percentile | 2016 |
| Maximum | 2016 |
| Range | 2016 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 545.9742433 |
|---|---|
| Coefficient of variation (CV) | 0.296854974 |
| Kurtosis | 7.106242715 |
| Mean | 1839.1952 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -3.016393557 |
| Sum | 91959760 |
| Variance | 298087.8743 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 2005 | 2910 | 5.8% | |
| 2016 | 2790 | 5.6% | |
| 2006 | 2731 | 5.5% | |
| 2000 | 2670 | 5.3% | |
| 1999 | 2654 | 5.3% | |
| 2003 | 2633 | 5.3% | |
| 2004 | 2617 | 5.2% | |
| 2001 | 2608 | 5.2% | |
| 2002 | 2529 | 5.1% | |
| 2007 | 2365 | 4.7% | |
| 2008 | 2200 | 4.4% | |
| 2009 | 2094 | 4.2% | |
| 1998 | 1990 | 4.0% | |
| 2010 | 1707 | 3.4% | |
| 2011 | 1651 | 3.3% | |
| 1997 | 1539 | 3.1% | |
| 2012 | 1307 | 2.6% | |
| 1996 | 1121 | 2.2% | |
| 2013 | 847 | 1.7% | |
| 1995 | 814 | 1.6% | |
| 2014 | 650 | 1.3% | |
| 1994 | 481 | 1.0% | |
| 2015 | 388 | 0.8% | |
| 1992 | 356 | 0.7% | |
| 1993 | 327 | 0.7% | |
| Other values (95) | 6021 | 12.0% |
| Value | Count | Frequency (%) | |
| 0 | 229 | 0.5% | |
| 1 | 222 | 0.4% | |
| 2 | 206 | 0.4% | |
| 3 | 253 | 0.5% | |
| 4 | 240 | 0.5% | |
| 5 | 254 | 0.5% | |
| 6 | 224 | 0.4% | |
| 7 | 228 | 0.5% | |
| 8 | 197 | 0.4% | |
| 9 | 194 | 0.4% |
| Value | Count | Frequency (%) | |
| 2016 | 2790 | 5.6% | |
| 2015 | 388 | 0.8% | |
| 2014 | 650 | 1.3% | |
| 2013 | 847 | 1.7% | |
| 2012 | 1307 | 2.6% | |
| 2011 | 1651 | 3.3% | |
| 2010 | 1707 | 3.4% | |
| 2009 | 2094 | 4.2% | |
| 2008 | 2200 | 4.4% | |
| 2007 | 2365 | 4.7% |
gearbox
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 49.0 KiB |
| Value | Count | Frequency (%) | |
| 1 | 37008 | 74.0% | |
| 0 | 10951 | 21.9% | |
| -1 | 2041 | 4.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 2 |
|---|---|
| Median length | 1 |
| Mean length | 1.04082 |
| Min length | 1 |
Most occurring characters
| Value | Count | Frequency (%) | |
| 1 | 39049 | 75.0% | |
| 0 | 10951 | 21.0% | |
| - | 2041 | 3.9% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Decimal Number | 50000 | 96.1% | |
| Dash Punctuation | 2041 | 3.9% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 1 | 39049 | 78.1% | |
| 0 | 10951 | 21.9% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 2041 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 52041 | 100.0% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| 1 | 39049 | 75.0% | |
| 0 | 10951 | 21.0% | |
| - | 2041 | 3.9% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 52041 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| 1 | 39049 | 75.0% | |
| 0 | 10951 | 21.0% | |
| - | 2041 | 3.9% |
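The length and character statistics for gearbox follow directly from the three category counts, because the label `-1` contributes two characters per occurrence. The sketch below rebuilds them from the frequency table. It illustrates the arithmetic and is not the code pandas-profiling runs.

```python
from collections import Counter

# Category counts copied from the gearbox frequency table.
category_counts = {"1": 37_008, "0": 10_951, "-1": 2_041}

n_values = sum(category_counts.values())

# Count every character, weighting each label by how often it occurs.
char_counts = Counter()
for label, count in category_counts.items():
    for ch in label:
        char_counts[ch] += count

n_chars = sum(char_counts.values())
mean_length = n_chars / n_values
char_share = {ch: c / n_chars for ch, c in char_counts.items()}

print(char_counts.most_common())
print(n_chars, mean_length)
```

This explains why the character `1` appears 39049 times although the category `1` occurs only 37008 times: the remaining 2041 occurrences come from the label `-1`.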
power
Real number (ℝ≥0)
| Distinct | 452 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 121.10506 |
|---|---|
| Minimum | 0 |
| Maximum | 16311 |
| Zeros | 4294 |
| Zeros (%) | 8.6% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 75 |
| median | 110 |
| Q3 | 150 |
| 95-th percentile | 233 |
| Maximum | 16311 |
| Range | 16311 |
| Interquartile range (IQR) | 75 |
Descriptive statistics
| Standard deviation | 188.7879379 |
|---|---|
| Coefficient of variation (CV) | 1.558877374 |
| Kurtosis | 3513.072746 |
| Mean | 121.10506 |
| Median Absolute Deviation (MAD) | 37 |
| Skewness | 52.35179232 |
| Sum | 6055253 |
| Variance | 35640.88548 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 0 | 4294 | 8.6% | |
| 75 | 3162 | 6.3% | |
| 150 | 2196 | 4.4% | |
| 140 | 1966 | 3.9% | |
| 60 | 1953 | 3.9% | |
| 101 | 1907 | 3.8% | |
| 116 | 1721 | 3.4% | |
| 90 | 1667 | 3.3% | |
| 170 | 1598 | 3.2% | |
| 105 | 1514 | 3.0% | |
| 136 | 973 | 1.9% | |
| 125 | 949 | 1.9% | |
| 163 | 909 | 1.8% | |
| 102 | 890 | 1.8% | |
| 143 | 842 | 1.7% | |
| 122 | 798 | 1.6% | |
| 131 | 768 | 1.5% | |
| 54 | 707 | 1.4% | |
| 110 | 673 | 1.3% | |
| 109 | 656 | 1.3% | |
| 120 | 590 | 1.2% | |
| 80 | 563 | 1.1% | |
| 50 | 558 | 1.1% | |
| 177 | 547 | 1.1% | |
| 58 | 537 | 1.1% | |
| Other values (427) | 17062 | 34.1% |
| Value | Count | Frequency (%) | |
| 0 | 4294 | 8.6% | |
| 1 | 1 | < 0.1% | |
| 2 | 3 | < 0.1% | |
| 4 | 3 | < 0.1% | |
| 5 | 14 | < 0.1% | |
| 6 | 5 | < 0.1% | |
| 7 | 3 | < 0.1% | |
| 8 | 2 | < 0.1% | |
| 9 | 2 | < 0.1% | |
| 10 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 16311 | 1 | < 0.1% | |
| 13636 | 1 | < 0.1% | |
| 13616 | 1 | < 0.1% | |
| 12512 | 1 | < 0.1% | |
| 12012 | 1 | < 0.1% | |
| 11530 | 1 | < 0.1% | |
| 11011 | 1 | < 0.1% | |
| 8500 | 1 | < 0.1% | |
| 7511 | 1 | < 0.1% | |
| 6226 | 1 | < 0.1% |
model
Real number (ℝ)
| Distinct | 248 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 94.90192 |
|---|---|
| Minimum | -1 |
| Maximum | 246 |
| Zeros | 61 |
| Zeros (%) | 0.1% |
| Memory size | 97.8 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 31 |
| median | 84 |
| Q3 | 154 |
| 95-th percentile | 224 |
| Maximum | 246 |
| Range | 247 |
| Interquartile range (IQR) | 123 |
Descriptive statistics
| Standard deviation | 72.46360852 |
|---|---|
| Coefficient of variation (CV) | 0.7635631452 |
| Kurtosis | -0.9795563456 |
| Mean | 94.90192 |
| Median Absolute Deviation (MAD) | 55 |
| Skewness | 0.4563148456 |
| Sum | 4745096 |
| Variance | 5250.97456 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 117 | 4121 | 8.2% | |
| 40 | 3541 | 7.1% | |
| 11 | 2927 | 5.9% | |
| -1 | 2257 | 4.5% | |
| 172 | 1627 | 3.3% | |
| 84 | 1486 | 3.0% | |
| 29 | 1451 | 2.9% | |
| 169 | 1431 | 2.9% | |
| 43 | 1406 | 2.8% | |
| 60 | 1235 | 2.5% | |
| 15 | 1192 | 2.4% | |
| 97 | 1099 | 2.2% | |
| 28 | 985 | 2.0% | |
| 221 | 836 | 1.7% | |
| 104 | 835 | 1.7% | |
| 31 | 834 | 1.7% | |
| 8 | 698 | 1.4% | |
| 103 | 681 | 1.4% | |
| 224 | 611 | 1.2% | |
| 107 | 585 | 1.2% | |
| 33 | 581 | 1.2% | |
| 6 | 575 | 1.1% | |
| 231 | 508 | 1.0% | |
| 219 | 486 | 1.0% | |
| 246 | 483 | 1.0% | |
| Other values (223) | 17529 | 35.1% |
| Value | Count | Frequency (%) | |
| -1 | 2257 | 4.5% | |
| 0 | 61 | 0.1% | |
| 1 | 5 | < 0.1% | |
| 2 | 72 | 0.1% | |
| 3 | 81 | 0.2% | |
| 4 | 31 | 0.1% | |
| 5 | 160 | 0.3% | |
| 6 | 575 | 1.1% | |
| 7 | 4 | < 0.1% | |
| 8 | 698 | 1.4% |
| Value | Count | Frequency (%) | |
| 246 | 483 | 1.0% | |
| 245 | 121 | 0.2% | |
| 244 | 33 | 0.1% | |
| 243 | 21 | < 0.1% | |
| 242 | 147 | 0.3% | |
| 241 | 56 | 0.1% | |
| 240 | 27 | 0.1% | |
| 239 | 26 | 0.1% | |
| 238 | 332 | 0.7% | |
| 237 | 35 | 0.1% |
mileage
Real number (ℝ≥0)
| Distinct | 13 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 125206.2 |
|---|---|
| Minimum | 5000 |
| Maximum | 150000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 5000 |
|---|---|
| 5-th percentile | 40000 |
| Q1 | 100000 |
| median | 150000 |
| Q3 | 150000 |
| 95-th percentile | 150000 |
| Maximum | 150000 |
| Range | 145000 |
| Interquartile range (IQR) | 50000 |
Descriptive statistics
| Standard deviation | 39587.83684 |
|---|---|
| Coefficient of variation (CV) | 0.3161811223 |
| Kurtosis | 0.9775977899 |
| Mean | 125206.2 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -1.474903011 |
| Sum | 6260310000 |
| Variance | 1567196825 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 150000 | 31784 | 63.6% | |
| 125000 | 5323 | 10.6% | |
| 100000 | 2191 | 4.4% | |
| 90000 | 1820 | 3.6% | |
| 80000 | 1625 | 3.2% | |
| 70000 | 1367 | 2.7% | |
| 60000 | 1308 | 2.6% | |
| 50000 | 1109 | 2.2% | |
| 40000 | 976 | 2.0% | |
| 20000 | 807 | 1.6% | |
| 30000 | 800 | 1.6% | |
| 5000 | 653 | 1.3% | |
| 10000 | 237 | 0.5% |
| Value | Count | Frequency (%) | |
| 5000 | 653 | 1.3% | |
| 10000 | 237 | 0.5% | |
| 20000 | 807 | 1.6% | |
| 30000 | 800 | 1.6% | |
| 40000 | 976 | 2.0% | |
| 50000 | 1109 | 2.2% | |
| 60000 | 1308 | 2.6% | |
| 70000 | 1367 | 2.7% | |
| 80000 | 1625 | 3.2% | |
| 90000 | 1820 | 3.6% |
| Value | Count | Frequency (%) | |
| 150000 | 31784 | 63.6% | |
| 125000 | 5323 | 10.6% | |
| 100000 | 2191 | 4.4% | |
| 90000 | 1820 | 3.6% | |
| 80000 | 1625 | 3.2% | |
| 70000 | 1367 | 2.7% | |
| 60000 | 1308 | 2.6% | |
| 50000 | 1109 | 2.2% | |
| 40000 | 976 | 2.0% | |
| 30000 | 800 | 1.6% |
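Because mileage takes only 13 distinct values, its frequency table fully determines the column, and the summary statistics can be re-derived from it. The sketch below expands the table into a flat list and recomputes the sum, mean, median and median absolute deviation (MAD) with the standard library. The variable names are illustrative.

```python
import statistics

# Value -> count pairs copied from the mileage frequency table.
mileage_counts = {
    150_000: 31_784, 125_000: 5_323, 100_000: 2_191, 90_000: 1_820,
    80_000: 1_625, 70_000: 1_367, 60_000: 1_308, 50_000: 1_109,
    40_000: 976, 20_000: 807, 30_000: 800, 5_000: 653, 10_000: 237,
}

# Expand the table back into one entry per observation.
values = [v for v, c in mileage_counts.items() for _ in range(c)]

total = sum(values)
mean = total / len(values)
median = statistics.median(values)
# MAD: median of the absolute deviations from the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), total, mean, median, mad)
```

The MAD of 0 is not an error. More than half of the observations (63.6%) equal the median of 150000, so the median of the absolute deviations is 0 as well.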
fuel
Real number (ℝ)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.48458 |
|---|---|
| Minimum | -1 |
| Maximum | 4 |
| Zeros | 87 |
| Zeros (%) | 0.2% |
| Memory size | 49.0 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 2 |
| Maximum | 4 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8465793274 |
|---|---|
| Coefficient of variation (CV) | 0.5702483715 |
| Kurtosis | 2.63990937 |
| Mean | 1.48458 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -1.645279202 |
| Sum | 74229 |
| Variance | 0.7166965575 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 2 | 29868 | 59.7% | |
| 1 | 15665 | 31.3% | |
| -1 | 3583 | 7.2% | |
| 3 | 777 | 1.6% | |
| 0 | 87 | 0.2% | |
| 4 | 20 | < 0.1% |
| Value | Count | Frequency (%) | |
| -1 | 3583 | 7.2% | |
| 0 | 87 | 0.2% | |
| 1 | 15665 | 31.3% | |
| 2 | 29868 | 59.7% | |
| 3 | 777 | 1.6% | |
| 4 | 20 | < 0.1% |
| Value | Count | Frequency (%) | |
| 4 | 20 | < 0.1% | |
| 3 | 777 | 1.6% | |
| 2 | 29868 | 59.7% | |
| 1 | 15665 | 31.3% | |
| 0 | 87 | 0.2% | |
| -1 | 3583 | 7.2% |
brand
Real number (ℝ≥0)
| Distinct | 40 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.53908 |
|---|---|
| Minimum | 0 |
| Maximum | 39 |
| Zeros | 314 |
| Zeros (%) | 0.6% |
| Memory size | 49.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 9 |
| median | 23 |
| Q3 | 33 |
| 95-th percentile | 38 |
| Maximum | 39 |
| Range | 39 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 13.45816983 |
|---|---|
| Coefficient of variation (CV) | 0.6552469649 |
| Kurtosis | -1.342794143 |
| Mean | 20.53908 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | -0.1378992627 |
| Sum | 1026954 |
| Variance | 181.1223352 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 38 | 10810 | 21.6% | |
| 2 | 5696 | 11.4% | |
| 24 | 5087 | 10.2% | |
| 20 | 4977 | 10.0% | |
| 1 | 4640 | 9.3% | |
| 10 | 3184 | 6.4% | |
| 27 | 2241 | 4.5% | |
| 25 | 1517 | 3.0% | |
| 9 | 1186 | 2.4% | |
| 30 | 963 | 1.9% | |
| 31 | 803 | 1.6% | |
| 19 | 744 | 1.5% | |
| 32 | 712 | 1.4% | |
| 5 | 696 | 1.4% | |
| 23 | 689 | 1.4% | |
| 36 | 626 | 1.3% | |
| 12 | 539 | 1.1% | |
| 21 | 529 | 1.1% | |
| 33 | 482 | 1.0% | |
| 39 | 434 | 0.9% | |
| 22 | 378 | 0.8% | |
| 15 | 362 | 0.7% | |
| 11 | 360 | 0.7% | |
| 0 | 314 | 0.6% | |
| 26 | 304 | 0.6% | |
| Other values (15) | 1727 | 3.5% |
| Value | Count | Frequency (%) | |
| 0 | 314 | 0.6% | |
| 1 | 4640 | 9.3% | |
| 2 | 5696 | 11.4% | |
| 3 | 258 | 0.5% | |
| 4 | 175 | 0.4% | |
| 5 | 696 | 1.4% | |
| 6 | 138 | 0.3% | |
| 7 | 54 | 0.1% | |
| 8 | 99 | 0.2% | |
| 9 | 1186 | 2.4% |
| Value | Count | Frequency (%) | |
| 39 | 434 | 0.9% | |
| 38 | 10810 | 21.6% | |
| 37 | 54 | 0.1% | |
| 36 | 626 | 1.3% | |
| 35 | 272 | 0.5% | |
| 34 | 102 | 0.2% | |
| 33 | 482 | 1.0% | |
| 32 | 712 | 1.4% | |
| 31 | 803 | 1.6% | |
| 30 | 963 | 1.9% |
damage
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 8266 |
| Missing (%) | 16.5% |
| Memory size | 390.8 KiB |
| Value | Count | Frequency (%) | |
| 0 | 37721 | 75.4% | |
| 1 | 4013 | 8.0% | |
| (Missing) | 8266 | 16.5% |
zipcode
Real number (ℝ≥0)
| Distinct | 7002 |
|---|---|
| Distinct (%) | 14.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 51436.40392 |
|---|---|
| Minimum | 1067 |
| Maximum | 99998 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 1067 |
|---|---|
| 5-th percentile | 10117 |
| Q1 | 30989 |
| median | 50374 |
| Q3 | 72415 |
| 95-th percentile | 93138 |
| Maximum | 99998 |
| Range | 98931 |
| Interquartile range (IQR) | 41426 |
Descriptive statistics
| Standard deviation | 25808.98566 |
|---|---|
| Coefficient of variation (CV) | 0.5017649698 |
| Kurtosis | -0.9866818816 |
| Mean | 51436.40392 |
| Median Absolute Deviation (MAD) | 20734 |
| Skewness | 0.03418460102 |
| Sum | 2571820196 |
| Variance | 666103740.7 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 10115 | 121 | 0.2% | |
| 65428 | 74 | 0.1% | |
| 40764 | 50 | 0.1% | |
| 44145 | 49 | 0.1% | |
| 60311 | 49 | 0.1% | |
| 52525 | 47 | 0.1% | |
| 66333 | 47 | 0.1% | |
| 32257 | 46 | 0.1% | |
| 13357 | 45 | 0.1% | |
| 65719 | 45 | 0.1% | |
| 60386 | 44 | 0.1% | |
| 53757 | 43 | 0.1% | |
| 50354 | 43 | 0.1% | |
| 61169 | 41 | 0.1% | |
| 76437 | 41 | 0.1% | |
| 77933 | 41 | 0.1% | |
| 65549 | 41 | 0.1% | |
| 31275 | 40 | 0.1% | |
| 90763 | 40 | 0.1% | |
| 41334 | 40 | 0.1% | |
| 51065 | 39 | 0.1% | |
| 65929 | 39 | 0.1% | |
| 47877 | 39 | 0.1% | |
| 56564 | 38 | 0.1% | |
| 65232 | 37 | 0.1% | |
| Other values (6977) | 48821 | 97.6% |
| Value | Count | Frequency (%) | |
| 1067 | 16 | < 0.1% | |
| 1069 | 7 | < 0.1% | |
| 1097 | 4 | < 0.1% | |
| 1099 | 7 | < 0.1% | |
| 1108 | 1 | < 0.1% | |
| 1109 | 10 | < 0.1% | |
| 1127 | 3 | < 0.1% | |
| 1129 | 7 | < 0.1% | |
| 1139 | 10 | < 0.1% | |
| 1156 | 9 | < 0.1% |
| Value | Count | Frequency (%) | |
| 99998 | 3 | < 0.1% | |
| 99996 | 1 | < 0.1% | |
| 99988 | 1 | < 0.1% | |
| 99986 | 1 | < 0.1% | |
| 99976 | 3 | < 0.1% | |
| 99974 | 13 | < 0.1% | |
| 99955 | 3 | < 0.1% | |
| 99947 | 14 | < 0.1% | |
| 99897 | 4 | < 0.1% | |
| 99894 | 4 | < 0.1% |
insurance_price
Real number (ℝ≥0)
| Distinct | 501 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 7331 |
| Missing (%) | 14.7% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 421.3452389 |
|---|---|
| Minimum | 10 |
| Maximum | 38960 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 10 |
|---|---|
| 5-th percentile | 50 |
| Q1 | 100 |
| median | 230 |
| Q3 | 510 |
| 95-th percentile | 1370 |
| Maximum | 38960 |
| Range | 38950 |
| Interquartile range (IQR) | 410 |
Descriptive statistics
| Standard deviation | 679.4443592 |
|---|---|
| Coefficient of variation (CV) | 1.612559717 |
| Kurtosis | 637.4215876 |
| Mean | 421.3452389 |
| Median Absolute Deviation (MAD) | 150 |
| Skewness | 16.00350852 |
| Sum | 17978380 |
| Variance | 461644.6373 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 70 | 1963 | 3.9% | |
| 60 | 1838 | 3.7% | |
| 80 | 1700 | 3.4% | |
| 90 | 1425 | 2.9% | |
| 50 | 1372 | 2.7% | |
| 100 | 1269 | 2.5% | |
| 110 | 1058 | 2.1% | |
| 140 | 1007 | 2.0% | |
| 120 | 992 | 2.0% | |
| 130 | 976 | 2.0% | |
| 150 | 925 | 1.8% | |
| 160 | 842 | 1.7% | |
| 170 | 833 | 1.7% | |
| 180 | 746 | 1.5% | |
| 40 | 739 | 1.5% | |
| 190 | 712 | 1.4% | |
| 210 | 677 | 1.4% | |
| 200 | 658 | 1.3% | |
| 230 | 638 | 1.3% | |
| 240 | 598 | 1.2% | |
| 220 | 593 | 1.2% | |
| 250 | 586 | 1.2% | |
| 260 | 506 | 1.0% | |
| 270 | 501 | 1.0% | |
| 290 | 478 | 1.0% | |
| Other values (476) | 19037 | 38.1% | |
| (Missing) | 7331 | 14.7% |
| Value | Count | Frequency (%) | |
| 10 | 121 | 0.2% | |
| 20 | 192 | 0.4% | |
| 30 | 324 | 0.6% | |
| 40 | 739 | 1.5% | |
| 50 | 1372 | 2.7% | |
| 60 | 1838 | 3.7% | |
| 70 | 1963 | 3.9% | |
| 80 | 1700 | 3.4% | |
| 90 | 1425 | 2.9% | |
| 100 | 1269 | 2.5% |
| Value | Count | Frequency (%) | |
| 38960 | 1 | < 0.1% | |
| 37620 | 1 | < 0.1% | |
| 29520 | 1 | < 0.1% | |
| 22470 | 1 | < 0.1% | |
| 22040 | 1 | < 0.1% | |
| 16590 | 1 | < 0.1% | |
| 16410 | 1 | < 0.1% | |
| 16060 | 1 | < 0.1% | |
| 13250 | 1 | < 0.1% | |
| 13080 | 1 | < 0.1% |
price
Real number (ℝ≥0)
| Distinct | 2331 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5746.90438 |
|---|---|
| Minimum | 455 |
| Maximum | 163800 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 455 |
|---|---|
| 5-th percentile | 591 |
| Q1 | 1365 |
| median | 3185 |
| Q3 | 7270 |
| 95-th percentile | 18655 |
| Maximum | 163800 |
| Range | 163345 |
| Interquartile range (IQR) | 5905 |
Descriptive statistics
| Standard deviation | 7688.683102 |
|---|---|
| Coefficient of variation (CV) | 1.337882553 |
| Kurtosis | 52.77745738 |
| Mean | 5746.90438 |
| Median Absolute Deviation (MAD) | 2184 |
| Skewness | 5.134366152 |
| Sum | 287345219 |
| Variance | 59115847.84 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) | |
| 1365 | 814 | 1.6% | |
| 455 | 804 | 1.6% | |
| 1092 | 718 | 1.4% | |
| 910 | 699 | 1.4% | |
| 2275 | 660 | 1.3% | |
| 728 | 578 | 1.2% | |
| 546 | 554 | 1.1% | |
| 3185 | 547 | 1.1% | |
| 909 | 513 | 1.0% | |
| 1820 | 476 | 1.0% | |
| 773 | 474 | 0.9% | |
| 682 | 470 | 0.9% | |
| 637 | 461 | 0.9% | |
| 864 | 441 | 0.9% | |
| 819 | 440 | 0.9% | |
| 591 | 429 | 0.9% | |
| 4095 | 425 | 0.9% | |
| 1638 | 422 | 0.8% | |
| 1456 | 416 | 0.8% | |
| 2002 | 416 | 0.8% | |
| 1001 | 412 | 0.8% | |
| 2730 | 402 | 0.8% | |
| 1183 | 395 | 0.8% | |
| 500 | 384 | 0.8% | |
| 1137 | 381 | 0.8% | |
| Other values (2306) | 37269 | 74.5% |
| Value | Count | Frequency (%) | |
| 455 | 804 | 1.6% | |
| 464 | 1 | < 0.1% | |
| 467 | 1 | < 0.1% | |
| 473 | 7 | < 0.1% | |
| 477 | 4 | < 0.1% | |
| 482 | 9 | < 0.1% | |
| 483 | 1 | < 0.1% | |
| 486 | 1 | < 0.1% | |
| 491 | 3 | < 0.1% | |
| 499 | 13 | < 0.1% |
| Value | Count | Frequency (%) | |
| 163800 | 1 | < 0.1% | |
| 159250 | 1 | < 0.1% | |
| 158340 | 1 | < 0.1% | |
| 154206 | 1 | < 0.1% | |
| 141050 | 1 | < 0.1% | |
| 138310 | 1 | < 0.1% | |
| 136500 | 1 | < 0.1% | |
| 135590 | 1 | < 0.1% | |
| 125489 | 1 | < 0.1% | |
| 123305 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
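As a minimal sketch of that definition, the function below computes r in plain Python. The sample data and the name `pearson_r` are made up for illustration, and the report itself presumably relies on a library routine rather than code like this.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # The 1/n factors of covariance and standard deviations cancel out.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / norm


# Hypothetical, roughly linear sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
# Flipping the sign of one variable flips the sign of r.
r_flipped = pearson_r(x, [-v for v in y])

print(r, r_rescaled, r_flipped)
```

The rescaled call illustrates the invariance under changes in location and scale. Negating one variable reverses the sign of r but not its magnitude.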
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
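The rank-based definition can be sketched by ranking both variables and applying Pearson's formula to the ranks. The helper names and the sample data below are illustrative. Ties receive the average of the ranks they span, which is a common convention but an assumption here.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this cubic relationship ρ reaches 1 while r stays below 1, which is the sense in which ρ captures nonlinear monotonic association.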
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
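The pair-counting definition translates almost directly into code. The sketch below implements it exactly as stated, a variant often called tau-a. Statistical libraries frequently apply an additional correction for ties, so values in the report may differ slightly on tied data. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie and counts as neither
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one swapped pair out of six possible pairs.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs. Five are concordant and one is discordant, giving τ = (5 - 1) / 6.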
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the coefficient.
First rows
| | Ind | engine_capacity | type | registration_year | gearbox | power | model | mileage | fuel | brand | damage | zipcode | insurance_price | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 48298 | 2.0 | 0 | 2006 | 0 | 140 | 58 | 150000 | 2 | 5 | 0.0 | 49191 | 380.0 | 4267 |
| 1 | 81047 | NaN | -1 | 2016 | -1 | 0 | 234 | 150000 | -1 | 20 | NaN | 45896 | NaN | 2457 |
| 2 | 92754 | 2.2 | 3 | 2010 | 1 | 175 | 154 | 125000 | 1 | 10 | 0.0 | 59229 | 930.0 | 10374 |
| 3 | 46007 | NaN | -1 | 2000 | 0 | 265 | 40 | 150000 | 2 | 10 | 0.0 | 39365 | 680.0 | 7098 |
| 4 | 76981 | NaN | 1 | 3 | 1 | 109 | 8 | 150000 | 2 | 25 | 0.0 | 55271 | NaN | 2365 |
| 5 | 9651 | NaN | 6 | 1999 | 0 | 122 | 60 | 150000 | 2 | 20 | 0.0 | 28195 | 80.0 | 1415 |
| 6 | 43085 | NaN | 6 | 1999 | 0 | 165 | 31 | 150000 | 2 | 1 | NaN | 73734 | 60.0 | 1091 |
| 7 | 94244 | NaN | -1 | 2016 | -1 | 55 | 224 | 125000 | 2 | 27 | 0.0 | 86470 | NaN | 1228 |
| 8 | 74568 | 1.6 | -1 | 2016 | 1 | 105 | 117 | 150000 | 2 | 38 | NaN | 79268 | 30.0 | 2275 |
| 9 | 20894 | 1.8 | 3 | 2005 | 1 | 116 | 173 | 150000 | 2 | 23 | 0.0 | 23554 | 330.0 | 4732 |
Last rows
| | Ind | engine_capacity | type | registration_year | gearbox | power | model | mileage | fuel | brand | damage | zipcode | insurance_price | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49990 | 30715 | NaN | 5 | 2004 | 1 | 65 | 242 | 125000 | 2 | 36 | 0.0 | 37269 | 170.0 | 1820 |
| 49991 | 79497 | NaN | 3 | 1994 | 1 | 102 | 99 | 150000 | 2 | 10 | 0.0 | 15518 | NaN | 500 |
| 49992 | 22451 | NaN | 1 | 2008 | 1 | 150 | 8 | 60000 | 2 | 25 | 0.0 | 54317 | 440.0 | 6643 |
| 49993 | 58497 | NaN | -1 | 2016 | 1 | 85 | 11 | 150000 | 2 | 2 | 0.0 | 66333 | NaN | 2002 |
| 49994 | 2397 | 1.6 | -1 | 2010 | 1 | 105 | 96 | 90000 | 2 | 6 | 0.0 | 61184 | 290.0 | 6961 |
| 49995 | 50429 | 1.4 | 3 | 2006 | 1 | 75 | 117 | 90000 | 2 | 38 | 0.0 | 35745 | 500.0 | 4686 |
| 49996 | 64425 | 1.3 | 5 | 4 | 1 | 60 | 103 | 150000 | 2 | 10 | 0.0 | 60386 | NaN | 864 |
| 49997 | 90761 | NaN | 3 | 1996 | 1 | 150 | 15 | 150000 | 2 | 2 | 0.0 | 28309 | 130.0 | 2275 |
| 49998 | 39709 | NaN | 3 | 2007 | 1 | 122 | 6 | 100000 | 1 | 2 | 0.0 | 83623 | 500.0 | 8144 |
| 49999 | 25524 | NaN | -1 | 1996 | 1 | 0 | 117 | 150000 | -1 | 38 | 0.0 | 26789 | 220.0 | 1592 |